SemanticScuttle - klotz.me » Tags: machine learning+clustering+python

Tags: machine learning* + clustering* + python*


I Was Wrong: Start Simple, Then Move to More Complex

The author discusses a shift in approach to clustering mixed data, advocating for starting with the simpler Gower distance metric before resorting to more complex embedding techniques like UMAP. They introduce 'Gower Express', an optimized and accelerated implementation of the Gower distance.
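As a rough illustration of what a Gower distance computes on mixed data, here is a minimal NumPy sketch. It is not the 'Gower Express' implementation; the function name `gower_matrix` and the toy data are assumptions made for this example. Numeric features contribute a range-normalized absolute difference, categorical features contribute 0 on a match and 1 on a mismatch, and the result is the unweighted mean over all features.

```python
import numpy as np

def gower_matrix(numeric, categorical):
    """Pairwise Gower distances for rows described by numeric and categorical columns.

    Hypothetical helper for illustration: equal feature weights, no missing values.
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical, dtype=object)

    # Range-normalize numeric differences so each feature contributes a value in [0, 1].
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute 0 instead of dividing by zero
    num_diff = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges

    # Categorical features: 0 when the categories match, 1 otherwise.
    cat_diff = (categorical[:, None, :] != categorical[None, :, :]).astype(float)

    n_features = numeric.shape[1] + categorical.shape[1]
    return (num_diff.sum(axis=2) + cat_diff.sum(axis=2)) / n_features

# Toy data (made up): age and income as numeric, color and a yes/no flag as categorical.
numeric = [[25, 50000], [35, 60000], [45, 90000]]
categorical = [["red", "yes"], ["red", "no"], ["blue", "no"]]

D = gower_matrix(numeric, categorical)
print(D)
```

The resulting matrix can be handed to any clustering routine that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`.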

2025-09-05 Tags: clustering, data science, machine learning, gower distance, umap, gower express, mixed data, python, scikit-learn, data analysis, shrunk by klotz

Demo of DBSCAN clustering algorithm

This example demonstrates Density-Based Spatial Clustering of Applications with Noise (DBSCAN) using scikit-learn, showing how to generate synthetic clusters, compute DBSCAN clustering, and visualize the results, including core and non-core samples.
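The workflow described above can be sketched roughly as follows. The blob centers, `eps`, and `min_samples` values are assumptions chosen for illustration, and the plotting step is left out to keep the sketch short.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Generate three synthetic clusters and standardize the features.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)
X = StandardScaler().fit_transform(X)

# Fit DBSCAN; points labeled -1 are treated as noise.
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

# Boolean mask separating core samples from non-core (border or noise) samples.
core_mask = np.zeros_like(labels, dtype=bool)
core_mask[db.core_sample_indices_] = True

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}, core samples: {core_mask.sum()}")
```

Core samples always belong to a cluster, so `core_mask` can be used to draw them with a larger marker than border points when visualizing the result.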

2025-04-18 Tags: dbscan, clustering, scikit-learn, machine learning, data mining, python, visualization by klotz

ASCVIT V1: Automatic Statistical Calculation, Visualization, and Interpretation Tool

ASCVIT V1 aims to make data analysis easier by automating statistical calculations, visualizations, and interpretations.

Includes descriptive statistics, hypothesis tests, regression, time series analysis, clustering, and LLM-powered data interpretation.

- Accepts CSV or Excel files. Provides a data overview including summary statistics, variable types, and data points.
- Histograms, boxplots, pairplots, correlation matrices.
- t-tests, ANOVA, chi-square test.
- Linear, logistic, and multivariate regression.
- Time series analysis.
- k-means, hierarchical clustering, DBSCAN.

Integrates with an LLM (large language model) via Ollama for automated interpretation of statistical results.

2024-09-17 Tags: foss, ascvit, statistical analysis, data visualization, llm, python, streamlit, machine learning, statistics, regression, time series, clustering, eda by klotz

A Guide to Clustering Algorithms

An overview of clustering algorithms, including centroid-based (K-Means, K-Means++), density-based (DBSCAN), hierarchical, and distribution-based clustering. The article explains how each type works, its pros and cons, provides code examples, and discusses use cases.
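To make the four families concrete, the sketch below runs one scikit-learn estimator from each on the same synthetic data. It is not taken from the article; the dataset and all parameter values are assumptions picked for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated blobs (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

labelings = {
    # Centroid-based, using k-means++ initialization.
    "k-means++": KMeans(
        n_clusters=3, init="k-means++", n_init=10, random_state=0
    ).fit_predict(X),
    # Density-based; the number of clusters is not specified, and -1 marks noise.
    "dbscan": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    # Hierarchical (agglomerative) with Ward linkage.
    "hierarchical": AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X),
    # Distribution-based: a Gaussian mixture fitted by expectation-maximization.
    "gaussian mixture": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
}

for name, labels in labelings.items():
    n_found = len(set(labels) - {-1})
    print(f"{name}: {n_found} clusters")
```

Only DBSCAN infers the number of clusters from the data; the other three take it as an input, which is one of the main trade-offs between the families.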

2024-09-06 Tags: clustering, unsupervised learning, machine learning, data science, python, k-means, k-means++, dbscan, hierarchical clustering, distribution based clustering by klotz

DBSCAN, Explained in 5 Minutes

A simple and intuitive explanation of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that can identify outliers, extract new features, compress data, and perform novelty detection. The article provides a fast implementation of DBSCAN in Python.

2024-08-25 Tags: dbscan, clustering, machine learning, python, density, spatial by klotz

Stop Using Elbow Method in K-means Clustering, Instead, Use this! | by Anmol Tomar | Towards Data Science

Elbow curves and silhouette plots are both useful techniques for finding the optimal K for K-means clustering.
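A minimal sketch of how both quantities can be computed with scikit-learn, assuming synthetic blob data and a candidate range of K from 2 to 8 (both chosen here for illustration). The inertia values are what an elbow curve plots; the mean silhouette score gives a single number per K that can be compared directly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (illustrative only).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.6, random_state=42)

inertias = {}  # within-cluster sum of squares, the y-axis of an elbow curve
scores = {}    # mean silhouette coefficient, in [-1, 1]; higher is better

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_
    scores[k] = silhouette_score(X, km.labels_)

# Pick the K with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
print("best K by silhouette:", best_k)
```

The silhouette score needs at least two clusters, which is why the loop starts at K = 2.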

2023-02-13 Tags: elbow, silhouette, optimization, k for k-means, clustering, machine learning, python by klotz

Scikit Learn - Clustering Methods

Comparing Clustering Algorithms
The following table compares the clustering algorithms in scikit-learn by parameters, scalability, and metric used.

| Sr. No. | Algorithm | Parameters | Scalability | Metric used |
|---|---|---|---|---|
| 1 | K-Means | No. of clusters | Very large n_samples | Distance between points |
| 2 | Affinity Propagation | Damping | Not scalable with n_samples | Graph distance |
| 3 | Mean-Shift | Bandwidth | Not scalable with n_samples | Distance between points |
| 4 | Spectral Clustering | No. of clusters | Medium scalability with n_samples, small scalability with n_clusters | Graph distance |
| 5 | Hierarchical Clustering | Distance threshold or no. of clusters | Large n_samples, large n_clusters | Distance between points |
| 6 | DBSCAN | Size of neighborhood | Very large n_samples and medium n_clusters | Nearest point distance |
| 7 | OPTICS | Minimum cluster membership | Very large n_samples and large n_clusters | Distance between points |
| 8 | BIRCH | Threshold, branching factor | Large n_samples, large n_clusters | Euclidean distance between points |

2021-10-29 Tags: machine learning, clustering, scikit-learn, python, tutorial, cheatsheet by klotz

A Fresh Look at Clustering Algorithms | by Dmitry Selemir | Towards Data Science

2021-10-24 Tags: clustering, python, machine learning, numpy by klotz

Hierarchical Clustering on Categorical Data in R - Towards Data Science

2019-10-10 Tags: hierarchical clustering, clustering, machine learning, python, categorical data by klotz

K-Means & Other Clustering Algorithms: A Quick Intro with Python – LearnDataSci